Analysis of property crimes, specifically focusing on thefts and burglaries, across the 195 provinces of Peru for the years 2018 to 2023
This document provides a descriptive analysis of the citizen security module 600 for the years 2022 and 2023, including victimization rates, perception of insecurity, and home burglaries by department.
This Quarto document serves as a practical illustration of the concepts analyzed in data analysis. It is designed primarily for educational purposes, focusing on demonstrating analytical techniques rather than scientific rigor.
0.1 Introduction
Today more than ever, Latin America is the most violent region in the world, with the highest crime and homicide rates—a situation that was already critical even before the arrival of COVID-19 (Dammert, 2022; Hernández, 2019; Hernández & Heimark, 2020). Across the region, crime and its consequences affect an increasing number of people and have become a recurrent topic in political debates. However, despite the fact that no other country in the region reports as high a perception of insecurity as Peru (Kanashiro, 2018), the country is not particularly violent. Between 2022 and 2023, the homicide rate was 7.8 per 100,000 inhabitants (Estadística e Informática [INEI], 2024), placing the country in the more “peaceful” quartile of the region. However, there are signs that urban violence is increasing due to new criminal modalities, such as contract killings and extortions, among others.
This report provides a detailed analysis of property crimes, specifically focusing on thefts and burglaries, across the 195 provinces of Peru for the years 2018 to 2023. The analysis uses data from the National Observatory of Citizen Security, which can be accessed here.
The report employs quintile-based classification to categorize risk levels into five distinct categories. The provincial boundaries used in this analysis are from 2023 and were sourced from the National Institute of Statistics and Informatics (INEI), which can be found here by GEO GPS PERÚ. The data on property crimes is sourced from the National Observatory of Citizen Security, part of the Ministry of the Interior.
This analysis aims to offer a comprehensive view of the distribution and risk levels of property crimes across Peruvian provinces, providing insights into spatial patterns and trends over the past five years.
For further information about the datasets and methodologies used, you can refer to the official sources linked above.
Let’s load the necessary libraries before starting!!
Code
# Use pacman's function p_load to install and load the necessary libraries:# First, install the pacman package if it's not already installed# install.packages("pacman")pacman::p_load(dplyr, # Data manipulationclassInt, # Classification intervalsggplot2, # Data visualizationpurrr, # Functional programmingreadxl, # Reading Excel filestidyverse, # Collection of R packages for data sciencesf, # Simple features for spatial datahaven, # Read and write data from other statistical softwarerio, # Easy import and export of datajanitor, # Cleaning datasjmisc, # Miscellaneous utilities for data analysisglue, # String interpolationsjlabelled, # Handling labelled dataHmisc, # Miscellaneous functions for data analysisstringdist, # String distance calculationsreactable, # for interactive tablesplotly,RColorBrewer)
0.2 Loading Data
Code
# Load the dataset with provincial crime data from an Excel filedelitos<-read_excel("../3-procesados/provincias_delitos.xlsx")# names(delitos)
Code
# Load the provincial boundaries shapefile for Peru from the specified pathperu_prov<-st_read(glue("C:/Users/user/Desktop/week2_spatial/data/shp-provincias/Provincial INEI 2023 geogpsperu SuyoPomalia.shp"),# funcion para importar desde tu carpeta (directorio) y con extension .shp quiet =TRUE# Suppress messages during import)# names(peru_prov)# str(peru_prov)
1 Fuzzy merge
Code
# Extract the province names from the crime data and the shapefileprovincias_delitos<-delitos$PROV_HECHOprovincias_peru_prov1<-peru_prov$PROVINCIA
Code
# Function to find the best match for a province name using string distancefind_matches<-function(provincia, provincias_list){distances<-stringdist::stringdist(provincia, provincias_list, method ="jw")min_distance<-min(distances)best_match<-provincias_list[which.min(distances)]return(data.frame( provincia_delitos =provincia, mejor_coincidencia =best_match, distancia =min_distance))}
Code
# Apply the matching function to all provinces in the crime datamatches<-do.call(rbind, lapply(provincias_delitos, find_matches, provincias_list =provincias_peru_prov1))# matches # Verificamos
Code
# Identify provinces in the crime data that did not have a close matchthreshold<-0.1# Set the threshold for distance to consider a matchnew_provincias<-matches[matches$distancia>threshold, ]new_provincias
# Find provinces in the shapefile that are not present in the crime datanot_in_delitos<-setdiff(provincias_peru_prov1, provincias_delitos)not_in_delitos
[1] "NASCA" "PUTUMAYO"
Code
# Rename "NASCA" to "NAZCA" and remove "PUTUMAYO" from the shapefile dataperu_prov1<-peru_prov|>mutate(PROVINCIA =recode(PROVINCIA, "NASCA"="NAZCA"))|>filter(PROVINCIA!="PUTUMAYO")|>select(PROVINCIA, geometry)
1.0.1 Merging
Code
# Merge the crime data with the shapefile datageo_delitos<-merge(delitos, peru_prov1, by.x ="PROV_HECHO", by.y ="PROVINCIA", all =TRUE)# geo_delitos # Verificamos
Insecurity in Peru. Source: Infobae
The image above provides an overview of insecurity in Peru. For a detailed review of the dataset, you can consult the interactive version using the DT package:
# Número de intervalos deseadosK<-5# Variable que queremos clasificartheVar<-geo_delitos$total_delitos# Clasificaciones utilizando diferentes métodosei5<-classIntervals(theVar, n =K, style ="equal")q5<-classIntervals(theVar, n =K, style ="quantile")fj5<-classIntervals(theVar, n =K, style ="fisher")jc5<-classIntervals(theVar, n =K, style ="jenks")# Método personalizado para puntos de ruptura máximos (aproximado)find_max_breaks<-function(x, k){breaks<-quantile(x, probs =seq(0, 1, length.out =k+1), na.rm =TRUE)return(classIntervals(x, n =k, style ="fixed", fixedBreaks =breaks))}mb5<-find_max_breaks(theVar, K)# Método personalizado para HeadTailBreaks (requiere implementación)# Aquí se usa un enfoque simplificadohead_tail_breaks<-function(x){breaks<-quantile(x, probs =c(0, 0.1, 0.5, 0.9, 1), na.rm =TRUE)return(classIntervals(x, n =length(breaks)-1, style ="fixed", fixedBreaks =breaks))}ht<-head_tail_breaks(theVar)# Clasificación en un solo dataframeclassifiers<-list(ei5, q5, fj5, jc5, mb5, ht)names(classifiers)<-c("EqualInterval", "Quantiles", "FisherJenks", "JenksCaspall", "MaxBreaks", "HeadTailBreaks")# Extraer ADCM (Average Distance to Class Mean) - Usando medidas personalizadascalculate_adcm<-function(class_intervals, data){breaks<-class_intervals$brksclasses<-cut(data, breaks =breaks, include.lowest =TRUE)means<-tapply(data, classes, mean, na.rm =TRUE)distances<-sapply(levels(classes), function(level){mean(abs(data[classes==level]-means[level]), na.rm =TRUE)})mean(distances)}# Calcular ADCM para cada clasificadoradcm_values<-sapply(classifiers, calculate_adcm, data =theVar)# Crear DataFrame con los resultadosadcm_df<-data.frame( ADCM =adcm_values, Classifier =names(adcm_values))# adcm_df
3 Guardamos las 2 mejores estrategias de clasificación al dataframe
Code
# Función para asignar clasificacionesassign_classifications<-function(values, breaks){classified<-cut(values, breaks =breaks, include.lowest =TRUE, labels =FALSE)return(classified)}# Añadir las nuevas columnas al DataFrame# geo_delitos$Total_ei5 <- assign_classifications(geo_delitos$total_delitos,# ei5$brks) # Con Limageo_delitos$Total_q5<-assign_classifications(geo_delitos$total_delitos,q5$brks)geo_delitos$Total_mb5<-assign_classifications(geo_delitos$total_delitos,mb5$brks)# Verifica si hay NA en las nuevas columnas y el rango de los intervalos# summary(geo_delitos$Total_ei5) # Con Limasummary(geo_delitos$Total_q5)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 3.000 2.996 4.000 5.000
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 3.000 2.996 4.000 5.000
Code
library(dplyr)# Lista de columnas de clasificaciónindexList<-c("Total_q5", "Total_mb5")# Función para calcular la media de 'total_delitos' por grupoaggregator<-function(data, group_col){data|>group_by(across(all_of(group_col)))|>summarise(mean_delitosp =mean(total_delitos, na.rm =TRUE), .groups ="drop")}# Realizar la agregación y combinación de resultadosresults_q5<-aggregator(geo_delitos, "Total_q5")results_mb5<-aggregator(geo_delitos, "Total_mb5")# Mostrar los resultadosresults_q5results_mb5
# Definir los nombres de columnas originalesindexList<-c("Total_q5", "Total_mb5")# Crear nuevos nombres de columna agregando "_cat"newColNames<-paste0(indexList, "_cat")# Crear nuevas columnas en `geo_delitos` copiando los valores de las columnas originalesgeo_delitos[newColNames]<-geo_delitos[indexList]# Definir las nuevas etiquetas para los niveles de clasificaciónnewLabelsForLevels<-c("1_LowRisk", "2_MediumLowRisk", "3_MediumRisk", "4_MediumHighRisk", "5_HighRisk")# Reemplazar los valores en las columnas categóricas con las nuevas etiquetasgeo_delitos<-geo_delitos%>%mutate(across(all_of(newColNames), ~factor(., levels =1:5, labels =newLabelsForLevels)))# Eliminar la columna 'Country' si existe en el DataFrameif("Country"%in%names(geo_delitos)){geo_delitos<-geo_delitos%>%select(-Country)}# Mostrar las primeras filas del DataFrame actualizado# geo_delitos
3.1 Utilizaremos el método de los cuantiles
Code
# Definir la paleta de colores "PuRd" con el número de colores igual al número de categorías únicas en Total_q5_catnum_classes<-length(unique(geo_delitos$Total_q5_cat))custom_palette_all<-brewer.pal(n =num_classes, name ="PuRd")# Crear mapa con todas las categoríasmap_all_classifiers<-ggplot(geo_delitos)+geom_sf(aes(fill =factor(Total_q5_cat), geometry =geometry), lwd =0.05, color ="white")+scale_fill_manual(values =custom_palette_all)+# Usar la paleta para todas las categoríaslabs( title ="Mapa de delitos patrimoniales por categoría en el Perú 2018-2023", subtitle ="Distribución de delitos según las categorías de riesgo según el método de Quantiles", caption ="Fuente: SIDPOL-PNP.Observatorio Nacional de Seguridad Ciudadana")+theme_void()+theme( plot.title =element_text(size =20, hjust =0.5), plot.subtitle =element_text(hjust =0.5))#map_all_classifiers
Clasificación del riesgo de delitos patrimoniales entre 2018-2023
4 Mapa de Riesgo Alto (5_highrisk)
Aquí se filtran los datos para mostrar solo las áreas con riesgo alto
Code
# Crear paleta personalizada para riesgo altopalette_highrisk<-c("#FF6347")# Rojo tomate para riesgo altopalette_other<-c("#E0E0E0")# Gris claro para otras provincias# Crear mapa coroplético para riesgo altomap_highrisk<-ggplot()+# Provincias en grisgeom_sf(data =geo_delitos%>%filter(Total_q5_cat!="5_HighRisk"),aes(geometry =geometry), fill =palette_other, color ="black", lwd =0.05)+# Provincias de riesgo alto en rojo tomategeom_sf(data =geo_delitos%>%filter(Total_q5_cat=="5_HighRisk"),aes(fill =factor(Total_q5_cat), geometry =geometry), color ="black", lwd =0.05)+scale_fill_manual(values =palette_highrisk, name ="Clasificación por Quantiles", breaks =c("5_HighRisk"), labels =c("Riesgo Alto"))+labs( title ="Mapa de Riesgo Alto en el Perú 2018-2023", subtitle ="Distribución de delitos patrimoniales de riesgo alto según el método de Quantiles", caption ="Fuente: SIDPOL PNP. Observatorio de Seguridad Ciudadana")+theme_void()+theme( plot.title =element_text(size =20, hjust =0.5), plot.subtitle =element_text(hjust =0.5), legend.position ="right", # Posicionar la leyenda en la parte inferior legend.text =element_text(size =12), legend.title =element_text(size =14))#map_highrisk
Riesgo alto en delitos patrimoniales entre 2018-2023
5 Mapa para Riesgo Bajo (1_lowrisk)
Aquí se filtran los datos para mostrar solo las áreas con riesgo bajo
Code
# Crear paleta personalizada para riesgo bajo, alto y otras provinciaspalette_lowrisk<-c("#64B5F6")# Azul claro para riesgo bajopalette_other<-c("#E0E0E0")# Gris neutro para otras provincias# Crear mapa coroplético para riesgo bajomap_lowrisk<-ggplot()+# Provincias en gris neutrogeom_sf(data =geo_delitos%>%filter(Total_q5_cat!="1_LowRisk"),aes(geometry =geometry), fill =palette_other, color ="black", lwd =0.1)+# Provincias de riesgo bajo en azul clarogeom_sf(data =geo_delitos%>%filter(Total_q5_cat=="1_LowRisk"),aes(fill =factor(Total_q5_cat), geometry =geometry), color ="black", lwd =0.1)+scale_fill_manual(values =palette_lowrisk, name ="Clasificación por Quantiles", breaks =c("1_LowRisk"), labels =c("Riesgo Bajo"))+labs( title ="Mapa de Riesgo Bajo en el Perú 2018-2023", subtitle ="Distribución de delitos patrimoniales de riesgo bajo según el método de Quantiles", caption ="Fuente: SIDPOL-PNP. Observatorio Nacional de Seguridad Ciudadana")+theme_void()+theme( plot.title =element_text(size =20, hjust =0.5), plot.subtitle =element_text(hjust =0.5), legend.position ="right", # Posicionar la leyenda en la parte inferior legend.text =element_text(size =12), legend.title =element_text(size =14))#map_lowrisk
Riesgo bajo en delitos patrimoniales en el Perú entre 2018-2023
References
Dammert, L. (2022). Contra el populismo punitivo. Planeta Perú.
Hernández, W. (2019). Costos sociales de la victimización en américa latina: Percepción de inseguridad, capital social y percepción de la democracia. Latin American Research Review, 54(4), 835–853. https://doi.org/10.25222/larr.23